Introduction to R and RStudio

Scatterplot with birth data:

library(mosaic)
library(mosaicData)
xyplot(births ~ dayofyear, data = Births78)

Other commands we ran on day 1 of the workshop, with some comments on them:

histogram(~ age, data = HELPrct) #histogram of age

favstats(~ age, data = HELPrct) #My favorite statistics; it ignores NAs
##  min Q1 median Q3 max     mean       sd   n missing
##   19 30     35 40  60 35.65342 7.710266 453       0
tally(~sex, data = HELPrct) #Count of gender
## 
## female   male 
##    107    346
tally(~sex, format = "percent", data = HELPrct) #Percents of gender
## 
##   female     male 
## 23.62031 76.37969
tally(~sex, format = "proportion", data = HELPrct) #Proportions of gender
## 
##    female      male 
## 0.2362031 0.7637969
tally(~substance, data = HELPrct) #Count of substance
## 
## alcohol cocaine  heroin 
##     177     152     124
tally(~substance, format = "perc", data = HELPrct) #Percents of substance
## 
##  alcohol  cocaine   heroin 
## 39.07285 33.55408 27.37307
tally(sex ~ substance, data = HELPrct) #Cross-tab sex & substance
##         substance
## sex      alcohol cocaine heroin
##   female      36      41     30
##   male       141     111     94
tally(~ sex + substance, data = HELPrct) #Ditto, just a different format
##         substance
## sex      alcohol cocaine heroin
##   female      36      41     30
##   male       141     111     94
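As a cross-check on what the format argument appears to do, the proportions and percents above can be reproduced from the raw counts in base R. This is only a sketch: the counts are typed in by hand from the tally(~sex, data = HELPrct) output, and the object names are made up for illustration.

```r
# Counts copied by hand from the tally(~sex, data = HELPrct) output above
sex_counts <- c(female = 107, male = 346)

# A proportion is each count divided by the total number of cases
sex_props <- sex_counts / sum(sex_counts)

# A percent is the proportion multiplied by 100
sex_percents <- 100 * sex_props

print(round(sex_props, 7))
print(round(sex_percents, 5))
```

The total of 453 cases agrees with the n reported by favstats for age.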

Also, remember the mplot command for producing graphs interactively, and click on ‘Show Expression’ to see the command that generates the graph.

Data Structures and Tidy Data

Consider cases and variables

In tidy data:

All of this goes into a codebook only.

Visualizing Data

DTK:

* Glyphs are marks. Data glyphs are also marks, and the features of the glyphs encode the values of the variables; these visual properties are called aesthetics.
* The choice we make as experts in our field is which aesthetic to map to which variable. When we make data glyphs we map variables to aesthetics.
* The word ‘aesthetic’ here is taken from its early ...
* A scale is for a computer and a guide is for a person (about the aesthetics). A legend (beside a graph) is an example of a guide.
* A data table is glyph-ready when there is one row for each glyph to be drawn (I could say, “that’s x-position, that’s y-position, that’s color, that’s size”).
* Glyph-ready data are tidy data, but tidy data are not necessarily glyph-ready.
* Sometimes glyphs represent the collective properties of variables, e.g. in the case of histograms.

RP:

library(lubridate)  # provides wday(), used below
data(Births78)
head(Births78, 3)
##         date births dayofyear
## 1 1978-01-01   7701         1
## 2 1978-01-02   7527         2
## 3 1978-01-03   8825         3
ggplot(data = Births78, aes(x = date, y = births)) + geom_point()

But we want to add the day of the week, because that is more useful to us:

Births78 <-
  Births78 %>%
  mutate(wday = wday(date, label = TRUE))
ggplot(data = Births78, aes(x = date, y = births, color = wday)) + geom_point()
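To see what wday() returns before using it as a color variable, here is a minimal sketch on a hand-built copy of the three rows printed by head() above. The data frame name births_toy is hypothetical; the day-name labels depend on the locale, and the numeric codes assume lubridate's default setting in which the week starts on Sunday.

```r
library(lubridate)

# Hypothetical toy copy of the first three rows of Births78 shown above
births_toy <- data.frame(
  date   = as.Date(c("1978-01-01", "1978-01-02", "1978-01-03")),
  births = c(7701, 7527, 8825)
)

# Numeric day of the week (with the default week start, Sunday is 1)
births_toy$wday_num <- wday(births_toy$date)

# Ordered factor of abbreviated day names, the form used for the color mapping
births_toy$wday <- wday(births_toy$date, label = TRUE)

print(births_toy)
```

1 January 1978 fell on a Sunday, so these three rows are a Sunday, a Monday and a Tuesday.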

Note that the following generates the same graph:

  ggplot(data = Births78) + geom_point(aes(x = date, y = births, color = wday)) 

We could change this to a line graph:

 ggplot(data = Births78) + geom_line(aes(x = date, y = births, color = wday)) 

Or we could have both points and lines. Note that we have moved the aes() call into the ggplot command, because the mapping applies to all the layers.

 ggplot(data = Births78, aes(x = date, y = births, color = wday)) + geom_line() + geom_point()

We could have put the data outside the ggplot command using the magrittr pipe:

 Births78 %>%
  ggplot(aes(x = date, y = births, color = wday)) + geom_line() + geom_point()

For a fixed color we need to set rather than map: the color is set inside the individual geom, outside aes() (this differs from ggvis, where setting is done with :=). Within the ggplot() call itself you can only map, not set.

 Births78 %>%
  ggplot(aes(x = date, y = births)) + geom_point(color = "navy")

Combine the colored lines with navy points. Notice that wday is inside aes() in geom_line, whereas geom_point does not use aes() because we are simply setting (not mapping).

 Births78 %>%
  ggplot(aes(x = date, y = births)) + 
    geom_line(aes(color = wday)) + 
    geom_point(color = "navy")
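A common slip is to put a fixed color inside aes(). The sketch below contrasts setting and mapping on a hypothetical toy data frame (toy, p_set and p_mapped are made-up names), so it does not depend on any of the workshop datasets.

```r
library(ggplot2)

# Hypothetical toy data, only for illustration
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

# Setting: the color is a fixed property of the layer, given outside aes()
p_set <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(color = "navy")

# Mapping by mistake: inside aes(), "navy" is treated as a data value,
# so ggplot2 assigns a color from its default scale and adds a legend
p_mapped <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(color = "navy"))

# layer_data() shows the colour actually used to draw each point
print(unique(layer_data(p_set)$colour))
print(unique(layer_data(p_mapped)$colour))
```

Only p_set draws navy points; in p_mapped the string "navy" is merely a label that ends up in the legend.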

Recall that we can list the geoms that exist with the apropos command. Notice the caret (^): the argument is a regular expression, and the caret anchors the match to the start of the name.

apropos("^geom")
##  [1] "geom_abline"     "geom_area"       "geom_bar"       
##  [4] "geom_bin2d"      "geom_blank"      "geom_boxplot"   
##  [7] "geom_contour"    "geom_crossbar"   "geom_density"   
## [10] "geom_density2d"  "geom_dotplot"    "geom_errorbar"  
## [13] "geom_errorbarh"  "geom_freqpoly"   "geom_hex"       
## [16] "geom_histogram"  "geom_hline"      "geom_jitter"    
## [19] "geom_line"       "geom_linerange"  "geom_map"       
## [22] "geom_path"       "geom_point"      "geom_pointrange"
## [25] "geom_polygon"    "geom_quantile"   "geom_raster"    
## [28] "geom_rect"       "geom_ribbon"     "geom_rug"       
## [31] "geom_segment"    "geom_smooth"     "geom_step"      
## [34] "geom_text"       "geom_tile"       "geom_violin"    
## [37] "geom_vline"
HELPrct %>%
  ggplot(aes(x = substance)) +
  geom_bar()

Notice that we were able to construct the bar chart even though our data weren’t glyph-ready with counts; ggplot did it for us.
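For comparison, here is a sketch of making the table glyph-ready ourselves, with one row per bar, and then drawing the bars from the precomputed counts. It uses a hypothetical stand-in data frame (toy_help) built to match the counts from the earlier tally(~substance, data = HELPrct) output rather than HELPrct itself, and it assumes dplyr's count() for the summarising step.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in with one row per case; group sizes match the tally() output above
toy_help <- data.frame(
  substance = rep(c("alcohol", "cocaine", "heroin"), times = c(177, 152, 124))
)

# Glyph-ready table: one row per bar, with the bar height in column n
substance_counts <- toy_help %>%
  count(substance)

print(substance_counts)

# stat = "identity" tells geom_bar() to use n as given instead of counting rows
p_counts <- ggplot(substance_counts, aes(x = substance, y = n)) +
  geom_bar(stat = "identity")
```

Printing p_counts draws the chart; it should look the same as the one geom_bar() produces from the raw cases.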

HELPrct %>% 
    ggplot(aes(x = age)) + 
    geom_histogram(binwidth = 2)

* We also often want to use frequency polygons or kernel density functions:

HELPrct %>%
  ggplot(aes(x = age)) +
  geom_freqpoly(binwidth = 2)

HELPrct %>%
  ggplot(aes(x = age)) +
  geom_density()

But, note RP prefers to add density to a line plot because it look

HELPrct %>%
  ggplot(aes(x = age)) +
  geom_line(stat = "density")

Or we could have specified the geom inside stat_density:

HELPrct %>%
  ggplot(aes(x = age)) +
  stat_density(geom = "line")
## ymax not defined: adjusting position using y instead

Now generate your own graph looking at average alcohol consumption (i1, the average number of drinks per day), which I did by group (substance):

HELPrct %>% 
    ggplot(aes(x = i1)) + 
    geom_line(stat = "density", aes(color = factor(substance)))

Teaching Tips

These are tips I picked out (thus idiosyncratic):

Commonplace Book

Working with some of my own data

Importing the medicare data…

#medicare <- read.csv("/Users/shalliday/Google Drive/simondhalliday.github.io/cvc_workshop/medicare_fy2013.csv")
medicare <- read.csv("medicare_fy2013.csv")
#head(medicare)
#str(medicare)

Trying to embed a badly thought-out plot using the ggvis package:

library(dplyr)
library(ggvis)
medicare %>% 
  ggvis(~factor(Provider.State), ~Average.Covered.Charges) %>%
  layer_bars() %>%
   add_axis("x", properties = axis_props(
    labels = list(angle = 45, align = "left", fontSize = 10)
   ))